{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "71a329f0",
   "metadata": {},
   "source": [
    "# LLAMA 7B with customized preprocessing\n",
    "In this tutorial, you will use LMI container from DLC to SageMaker and run inference with it.\n",
    "\n",
    "Please make sure the following permission granted before running the notebook:\n",
    "\n",
    "- S3 bucket push access\n",
    "- SageMaker access\n",
    "\n",
    "## Step 1: Let's bump up SageMaker and import stuff"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "67fa3208",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install sagemaker --upgrade  --quiet"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ec9ac353",
   "metadata": {},
   "outputs": [],
   "source": [
    "import boto3\n",
    "import sagemaker\n",
    "from sagemaker import Model, image_uris, serializers, deserializers\n",
    "\n",
    "role = sagemaker.get_execution_role()  # execution role for the endpoint\n",
    "sess = sagemaker.session.Session()  # sagemaker session for interacting with different AWS APIs\n",
    "region = sess._region_name  # region name of the current SageMaker Studio environment\n",
    "account_id = sess.account_id()  # account_id of the current SageMaker Studio environment"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "81deac79",
   "metadata": {},
   "source": [
    "## Step 2: Start preparing model artifacts\n",
    "In LMI contianer, we expect some artifacts to help setting up the model\n",
    "- serving.properties (required): Defines the model server settings\n",
    "- model.py (optional): A python file to define the core inference logic\n",
    "- requirements.txt (optional): Any additional pip wheel need to install"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b011bf5f",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile serving.properties\n",
    "engine=MPI\n",
    "option.model_id=openlm-research/open_llama_7b\n",
    "option.task=text-generation\n",
    "option.trust_remote_code=true\n",
    "option.tensor_parallel_degree=1\n",
    "option.max_rolling_batch_size=32\n",
    "option.rolling_batch=lmi-dist\n",
    "option.dtype=fp16"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1a0ab053",
   "metadata": {},
   "source": [
    "In this step, we will try to override the [default HuggingFace handler](https://github.com/deepjavalibrary/djl-serving/blob/0.27.0-dlc/engines/python/setup/djl_python/huggingface.py) provided by DJLServing. We will add an extra parameter checker called `password` to see if password is correct in the payload."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "19d6798b",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile model.py\n",
    "from djl_python.huggingface import HuggingFaceService\n",
    "from djl_python import Output\n",
    "from djl_python.encode_decode import encode, decode\n",
    "from djl_python.input_parser import input_formatter\n",
    "from djl_python.request_io import TextInput\n",
    "import logging\n",
    "import json\n",
    "import types\n",
    "\n",
    "_service = HuggingFaceService()\n",
    "\n",
    "@input_formatter\n",
    "def custom_input_formatter(self, input_item: Input, **kwargs) -> TextInput:\n",
    "    \"\"\"\n",
    "    Replace this function with your custom input formatter.\n",
    "        \n",
    "    Args:\n",
    "        data (obj): The request data, dict or string  \n",
    "\n",
    "    Returns:\n",
    "        (tuple): input_data (list), input_size (list), parameters (dict), errors (dict), batch (list)\n",
    "    \"\"\"\n",
    "    content_type = input_item.get_property(\"Content-Type\")\n",
    "    input_map = decode(input_item, content_type)\n",
    "    \n",
    "    inputs = input_map.pop(\"inputs\", input_map)\n",
    "    password = input_map.pop(\"password\", \"\")\n",
    "    # password checker\n",
    "    if inputs != [\"\"] and password != \"12345\":\n",
    "        raise ValueError(\"Incorrect password!\")\n",
    "    \n",
    "    request_input = TextInput()\n",
    "    request_input.input_text = input_map.pop(\"prompt\", input_map)\n",
    "    request_input.parameters = input_map.pop(\"parameters\", {})\n",
    "    \n",
    "    return request_input"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b0142973",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sh\n",
    "mkdir mymodel\n",
    "mv serving.properties mymodel/\n",
    "mv model.py mymodel/\n",
    "tar czvf mymodel.tar.gz mymodel/\n",
    "rm -rf mymodel"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2e58cf33",
   "metadata": {},
   "source": [
    "## Step 3: Start building SageMaker endpoint\n",
    "In this step, we will build SageMaker endpoint from scratch"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4d955679",
   "metadata": {},
   "source": [
    "### Getting the container image URI\n",
    "\n",
    "[Large Model Inference available DLC](https://github.com/aws/deep-learning-containers/blob/master/available_images.md#large-model-inference-containers)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7a174b36",
   "metadata": {},
   "outputs": [],
   "source": [
    "image_uri = image_uris.retrieve(\n",
    "        framework=\"djl-lmi\",\n",
    "        region=sess.boto_session.region_name,\n",
    "        version=\"0.30.0\"\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "11601839",
   "metadata": {},
   "source": [
    "### Upload artifact on S3 and create SageMaker model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "38b1e5ca",
   "metadata": {},
   "outputs": [],
   "source": [
    "s3_code_prefix = \"large-model-lmi/code\"\n",
    "bucket = sess.default_bucket()  # bucket to house artifacts\n",
    "code_artifact = sess.upload_data(\"mymodel.tar.gz\", bucket, s3_code_prefix)\n",
    "print(f\"S3 Code or Model tar ball uploaded to --- > {code_artifact}\")\n",
    "\n",
    "model = Model(image_uri=image_uri, model_data=code_artifact, role=role)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "004f39f6",
   "metadata": {},
   "source": [
    "### 4.2 Create SageMaker endpoint\n",
    "\n",
    "You need to specify the instance to use and endpoint names"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8e0e61cd",
   "metadata": {},
   "outputs": [],
   "source": [
    "instance_type = \"ml.g5.2xlarge\"\n",
    "endpoint_name = sagemaker.utils.name_from_base(\"lmi-model\")\n",
    "\n",
    "model.deploy(initial_instance_count=1,\n",
    "             instance_type=instance_type,\n",
    "             endpoint_name=endpoint_name,\n",
    "             # container_startup_health_check_timeout=3600\n",
    "            )\n",
    "\n",
    "# our requests and responses will be in json format so we specify the serializer and the deserializer\n",
    "predictor = sagemaker.Predictor(\n",
    "    endpoint_name=endpoint_name,\n",
    "    sagemaker_session=sess,\n",
    "    serializer=serializers.JSONSerializer(),\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bb63ee65",
   "metadata": {},
   "source": [
    "## Step 5: Test and benchmark the inference"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "79786708",
   "metadata": {},
   "source": [
    "Firstly let's try to run with a wrong inputs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2bcef095",
   "metadata": {},
   "outputs": [],
   "source": [
    "predictor.predict(\n",
    "    {\"inputs\": \"Large model inference is\", \"parameters\": {}}\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "24d6bf6d",
   "metadata": {},
   "source": [
    "Then let's run with the right one"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d9789399",
   "metadata": {},
   "outputs": [],
   "source": [
    "predictor.predict(\n",
    "    {\"inputs\": \"Large model inference is\", \"parameters\": {}, \"password\": \"12345\"}\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c1cd9042",
   "metadata": {},
   "source": [
    "## Clean up the environment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3d674b41",
   "metadata": {},
   "outputs": [],
   "source": [
    "sess.delete_endpoint(endpoint_name)\n",
    "sess.delete_endpoint_config(endpoint_name)\n",
    "model.delete_model()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
